Optimal transportation with infinitely many marginals



We formulate and study an optimal transportation problem with infinitely many marginals; this is a natural extension of the multi-marginal problem studied by Gangbo and Swiech [14]. We prove results on the existence, uniqueness and characterization of the optimizer, which are natural extensions of the results in [14]. The proof relies on a relationship between this problem and the problem of finding barycenters in the Wasserstein space, a connection first observed for finitely many marginals by Agueh and Carlier [1].

1 Introduction 

In this paper, we study an optimal transportation problem with infinitely many 
marginals. 

Optimal transportation with two marginals is an exciting and fast moving 
area of research. The general goal is to couple two probability measures together 
as efficiently as possible, relative to a given cost function. More precisely, given 
measures $\mu_1$ and $\mu_2$ (called marginals) on topological spaces $M_1$ and $M_2$, respectively, and a cost function $c : M_1 \times M_2 \to \mathbb{R}$, the aim is to find the measure $\gamma$ on $M_1 \times M_2$ which projects to $\mu_1$ and $\mu_2$ and minimizes the total transportation cost:

$$\int_{M_1 \times M_2} c(x_1, x_2)\, d\gamma.$$

Equivalently, one can formulate this problem using more probabilistic language. Here one looks for an $M_1 \times M_2$ valued random variable $(X_1, X_2)$, such that $\mathrm{law}(X_i) = \mu_i$, for $i = 1, 2$, which minimizes the expectation:

$$E[c(X_1, X_2)].$$

Results about the existence, uniqueness and structure of the optimal measure $\gamma$ have been proven for a wide class of cost functions and marginals; for a detailed review, see the monograph of Villani [29]. A central theme is that, under certain conditions on the cost and the measures, there is a unique optimal measure $\gamma$, concentrated on the graph of a function, $x_2 = F(x_1)$; this was first proven for the quadratic cost $c(x_1, x_2) = |x_1 - x_2|^2$ on $M_1 = M_2 = \mathbb{R}^n$ by Brenier [3] and was generalized to a large class of cost functions by Gangbo [12], Gangbo and McCann [13], Caffarelli [5], McCann [20] and Levin [15]. In probabilistic terms, this means that the random variables $(X_1, X_2)$ are completely dependent.

In recent years, optimal transportation problems with several marginals have started to attract more attention; this is a natural generalization of the preceding problem. Given $m$ probability measures $\mu_1, \mu_2, \ldots, \mu_m$ on topological spaces $M_1, M_2, \ldots, M_m$, and a cost function $c : M_1 \times M_2 \times \cdots \times M_m \to \mathbb{R}$, we look for the measure $\gamma$ on the product $M_1 \times M_2 \times \cdots \times M_m$ which projects to the $\mu_i$, respectively, and minimizes

$$\int_{M_1 \times M_2 \times \cdots \times M_m} c(x_1, x_2, \ldots, x_m)\, d\gamma.$$

As in the two marginal case, this problem may be formulated probabilistically. In this setting, one looks for an $M_1 \times M_2 \times \cdots \times M_m$ valued random variable $(X_1, X_2, \ldots, X_m)$, such that $\mathrm{law}(X_i) = \mu_i$ for $i = 1, 2, \ldots, m$, minimizing

$$E[c(X_1, X_2, \ldots, X_m)].$$

In contrast to the two marginal case, results concerning the structure of the optimal measure for $m > 2$ are rather scarce. However, Gangbo and Swiech proved that for the cost function $c(x_1, x_2, \ldots, x_m) = \sum_{i=1}^{m} \sum_{j=1}^{m} |x_i - x_j|^2$ on $M_i = \mathbb{R}^n$, the Kantorovich problem admits a unique solution which is concentrated on the graph of a function over the first marginal, generalizing Brenier's theorem [14]; see also [22], [17], [26] and [27]. As in the two marginal case, this means that the random variables $(X_1, X_2, \ldots, X_m)$ are completely dependent. Since then, a handful of results have been proven on the structure of solutions for different cost functions by Heinich [15], Carlier [B], Carlier and Nazaret [5] and the present author [24, 25]. Applications for multi-marginal optimal transportation have also arisen in mathematical economics [7, 9] and condensed matter physics [11], [10].

Our goal in the present article is to study this problem in the limit as $m \to \infty$, restricting our attention to a cost function reminiscent of that of Gangbo and Swiech. More precisely, we will prescribe a continuum of probability measures $\mu_t$ on $\mathbb{R}^n$, for $t \in [0, 1]$. We will then look for the measurable¹ stochastic process, $X_t$, with single time marginals $\mathrm{law}(X_t) = \mu_t$, that minimizes

$$E\Big(\int_0^1 \int_0^1 |X_s - X_t|^2\, ds\, dt\Big). \qquad (MK_\infty)$$

¹ By definition, the stochastic process $X_t = X_t(\omega)$ is a mapping from $\Omega \times [0, 1] \to \mathbb{R}^n$, where $\Omega$ is a probability space. By measurable, we mean that this mapping is measurable with respect to product measure on $\Omega \times [0, 1]$; by Fubini's theorem, this implies that the sample paths $t \mapsto X_t$ are measurable almost surely.

After expanding $|X_s - X_t|^2$ and noting that, by Fubini's theorem, $E(\int_0^1 |X_t|^2\, dt) = \int_0^1 E(|X_t|^2)\, dt = \int_0^1 \int_{\mathbb{R}^n} |x|^2\, d\mu_t(x)\, dt$ for any measurable process such that $\mathrm{law}(X_t) = \mu_t$ for all $t$, it is clear that this is equivalent to maximizing:

$$E\Big(\Big|\int_0^1 X_t\, dt\Big|^2\Big).$$

We can think of the function $\int_0^1 \int_0^1 |X_s - X_t|^2\, ds\, dt$ as the limit of the Gangbo and Swiech cost. On the other hand, for a sample path $X_t$, the integral $\int_0^1 X_t\, dt$ represents the average position of the sample path. If we think of $X_t$ as representing a particle moving in a quadratic potential, over a time period $t \in [0, 1]$, then $|\int_0^1 X_t\, dt|^2$ is the potential of the average position of the particle.
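The equivalence between minimizing the double integral and maximizing the squared average can be checked numerically on a single discretized sample path. The following is only an illustrative sketch: the helper names `pairwise_cost` and `expanded_cost` are hypothetical, and the path is an arbitrary random sequence of points in $\mathbb{R}^3$ sampled at equally weighted times, standing in for $t \mapsto X_t$.

```python
import random

def pairwise_cost(path):
    """Discrete analogue of the double integral of |X_s - X_t|^2 over [0,1]^2."""
    n = len(path)
    total = 0.0
    for xs in path:
        for xt in path:
            total += sum((a - b) ** 2 for a, b in zip(xs, xt))
    return total / n**2

def expanded_cost(path):
    """Same quantity after expanding the square:
    2 * (time average of |X_t|^2) - 2 * |time average of X_t|^2."""
    n = len(path)
    dim = len(path[0])
    mean_sq = sum(sum(a * a for a in x) for x in path) / n
    avg = [sum(x[k] for x in path) / n for k in range(dim)]
    avg_sq = sum(a * a for a in avg)
    return 2.0 * mean_sq - 2.0 * avg_sq

random.seed(0)
# Arbitrary sample path: 50 time steps in R^3 (illustrative data only).
path = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
lhs = pairwise_cost(path)
rhs = expanded_cost(path)
print(lhs, rhs)
```

Since the first term of the expansion is fixed by the marginals after taking expectations, only the squared average remains to be optimized, with the opposite sign.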

Our main result, Theorem 4.2.2, asserts existence and uniqueness of an optimizer in $(MK_\infty)$, as well as a characterization of it, and is the natural generalization of the result of Gangbo and Swiech from finitely many to infinitely many marginals. Roughly speaking, it says that the random curve $X_t$ is completely dependent, or deterministic; if $X_{t_0}$ is known for one fixed $t_0$, then $X_t$ is known for all $t$ (see Theorem 4.2.4).

The typical approach to optimal transportation problems (with finitely many 
marginals) is to develop a duality theory, and then to use the resulting first order 
conditions to derive structural results about the optimal measure. Our strat- 
egy here is quite different. A recent paper of Agueh and Carlier relates the 
multi-marginal problem with Gangbo and Swiech's cost function to barycenters 
in the Wasserstein space [1]. In this paper, we first generalize their results on
existence, uniqueness and regularity from barycenters of finitely many points 
to barycenters of curves. Having done this, we adapt their relationship be- 
tween barycenters and multi-marginal problems to our setting and then exploit 
this connection to deduce the existence and uniqueness of the solution to our 
problem. 

Barycenters in the Wasserstein space are an interesting topic in their own 
right. Barycenters of probability measures on general length spaces have at- 
tracted quite a bit of attention recently, in large part because of their relation- 
ship to curvature. In spaces with Alexandrov curvature bounded above, the 
behaviour of barycenters is fairly well understood; see the work of Sturm j^H] • 
The study of barycenters on spaces with curvature bounded below has recently 
been initiated by Ohta, and remains in its infancy [21] . It is, however, already 
apparent that barycenters on spaces with lower curvature bounds are not as well 
behaved as their counterparts on spaces with upper curvature bounds. In par- 
ticular, on spaces with non-positive curvature, each measure admits a unique 
barycenter, whereas on spaces with non-negative curvature, barycenters may 
be non-unique. As an elementary example, every point on the equator is a 
barycenter of the north and south poles on the unit sphere.

In addition, the problem of interpolating among several probability measures has begun to arise in applied problems including texture mixing and mathematical economics. In fact, in [7], the authors also consider an extension of their model which involves interpolating among an infinite number of measures.

It is well known that the Wasserstein space over $\mathbb{R}^n$ does not have non-positive Alexandrov curvature [2]. The work of Agueh and Carlier provides uniqueness and regularity results, as well as a characterization of the barycenter of finitely many points in the Wasserstein space, under certain regularity conditions. Our first contribution is to extend their uniqueness and regularity result to a continuous curve $\mu_t$ of measures. It is also worth noting that our
techniques here can be used to extend some of the results of Agueh and Carlier 
to other underlying spaces; in particular, we prove uniqueness of barycenters in 
the Wasserstein space over a Riemannian manifold. 

Finally, let us mention that in a separate paper, we study infinite marginal 
optimal transportation for somewhat more general cost functions, restricted to 
the case n = 1 [55]. The techniques used there are quite different than here. 

In the next section, we will introduce our hypotheses on the curve of measures, $\mu_t$, as well as two regularity assumptions which will be assumed only at specific points. In the third section we will study the barycenter of the curve $\mu_t$, proving existence, uniqueness and regularity, as well as demonstrating that the uniqueness result can be extended to other settings. In section 4, we develop the connection between barycenters and the problem $(MK_\infty)$ and use this to prove existence and uniqueness of the optimal stochastic process.

2 Notation and assumptions 

We will denote by $P_2(\mathbb{R}^n)$ the set of all probability measures on $\mathbb{R}^n$ with finite second moments and by $P_{ac,2}(\mathbb{R}^n)$ the subset of these which are absolutely continuous with respect to Lebesgue measure. For $\mu, \nu \in P_2(\mathbb{R}^n)$, $W_2(\mu, \nu)$ denotes the quadratic Wasserstein distance between the measures $\mu$ and $\nu$:

$$W_2^2(\mu, \nu) = \inf_{\gamma} \int_{\mathbb{R}^n \times \mathbb{R}^n} |x - y|^2\, d\gamma(x, y),$$

where the infimum is taken over all Borel probability measures $\gamma$ on $\mathbb{R}^n \times \mathbb{R}^n$ projecting to $\mu$ and $\nu$, respectively.
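As a concrete illustration of this definition, consider the special case $n = 1$ with two empirical measures having the same number of equally weighted atoms. In that case the infimum is attained by pairing the atoms in sorted order (the monotone rearrangement). The sketch below assumes exactly this setting; the function name `w2_empirical_1d` is hypothetical and the code does not apply in higher dimensions.

```python
import math

def w2_empirical_1d(xs, ys):
    """Quadratic Wasserstein distance between two empirical measures on the
    real line with the same number of equally weighted atoms.
    In one dimension the optimal coupling pairs sorted atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many atoms")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    mean_sq = sum((a - b) ** 2 for a, b in zip(xs_sorted, ys_sorted)) / len(xs)
    return math.sqrt(mean_sq)

# The second measure is the first one translated by 1.5,
# so the distance should be the size of the translation.
shifted = w2_empirical_1d([0.0, 1.0, 2.0], [3.5, 2.5, 1.5])
print(shifted)
```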

Let $M \subset \mathbb{R}^n$ be a bounded domain and $P(M) \subset P_2(\mathbb{R}^n)$ be the set of all Borel probability measures on $M$. We will assume that all of our measures $\mu_t$ are supported on $M$; that is, $\mu_t \in P(M)$. We will denote by $c(M)$ the convex hull of $M$ and by $P(c(M))$ the set of Borel probability measures on $c(M)$. We will assume that $\mu_t$ is a weakly continuous curve in $P(M)$; that is, we assume the mapping $t \mapsto \mu_t$ is a continuous map with respect to the weak topology. Note that, by the boundedness of $M$, this is equivalent to continuity with respect to the Wasserstein metric.

We now introduce two different regularity conditions on the $\mu_t$, which we will assume at different times.

Assumption A. The set

$$A := \{t : \mu_t = g_t(x)\,dx \text{ is absolutely continuous with respect to Lebesgue measure}\}$$

has positive Lebesgue measure.

Assumption B. The set

$$A_\infty := \{t : \mu_t = g_t(x)\,dx \text{ is absolutely continuous with respect to Lebesgue measure and } \|g_t\|_{L^\infty} < \infty\}$$

has positive Lebesgue measure.

Note that Assumption B easily implies that for some $K < \infty$, the set

$$A_K := \{t : \mu_t = g_t(x)\,dx \text{ is absolutely continuous with respect to Lebesgue measure and } \|g_t\|_{L^\infty} \le K\}$$

has positive Lebesgue measure.

Finally, note that Assumption B clearly implies Assumption A.

3 Barycenters

In this section, we study the barycenter of the measures $\mu_t$. By definition, this is the minimizer of

$$\mu \mapsto \int_0^1 W_2^2(\mu_t, \mu)\, dt \qquad (B_\infty)$$

over the set $P_2(\mathbb{R}^n)$. In this section, we consider existence, uniqueness, and regularity of the barycenter. We also prove generalizations for other distributions of measures and other underlying spaces.

3.1 Existence of the barycenter

Rather than proving the existence of a barycenter directly, we will, loosely speaking, approximate $(B_\infty)$ by

$$\mu \mapsto \frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu_{i/N}, \mu) \qquad (B_N)$$

and take the limit as $N$ tends to infinity. This approach will prove useful later when we establish the regularity of the barycenter.
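The approximation of $(B_\infty)$ by $(B_N)$ can be made explicit in one dimension, where it is a standard fact that the minimizer of $(B_N)$ has quantile function equal to the average of the quantile functions of the $\mu_{i/N}$. The sketch below rests on assumptions not taken from the text: it uses $n = 1$ and a hypothetical Gaussian curve $\mu_t = N(t, (1+t)^2)$, for which the limiting barycenter is the Gaussian with mean $\int_0^1 t\, dt = 0.5$ and standard deviation $\int_0^1 (1+t)\, dt = 1.5$.

```python
from statistics import NormalDist

def curve(t):
    """Hypothetical curve of measures mu_t = N(t, (1 + t)^2) on the real line."""
    return NormalDist(mu=t, sigma=1.0 + t)

def barycenter_quantile(u, n_points):
    """Quantile at level u of the minimizer of (B_N) in one dimension:
    the equal-weight average of the quantiles of mu_{i/N}, i = 1..N."""
    total = sum(curve(i / n_points).inv_cdf(u) for i in range(1, n_points + 1))
    return total / n_points

# Limiting barycenter for this Gaussian family (one-dimensional fact).
limit = NormalDist(mu=0.5, sigma=1.5)

for n_points in (10, 100, 1000):
    gap = abs(barycenter_quantile(0.9, n_points) - limit.inv_cdf(0.9))
    print(n_points, gap)
```

The printed gaps shrink as $N$ grows, mirroring the convergence $\mu^N \to \mu^\infty$ used in the existence proof.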



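Proposition 3.1.1. A barycenter (a measure $\mu \in P_2(\mathbb{R}^n)$ minimizing $(B_\infty)$) exists and it is supported on the convex hull of $M$.

Proof. The result of Agueh and Carlier implies the existence of a minimizer $\mu^N$ for $(B_N)$, as this is simply the barycenter for the measures $\mu_{i/N}$, $i = 1, \ldots, N$, with equal weights. They also prove that it is supported on the set $\sum_{i=1}^{N} \frac{1}{N} M$, which is contained in the convex hull $c(M)$. This yields, for all $\nu \in P_2(\mathbb{R}^n)$,

$$\sum_{i=1}^{N} W_2^2(\mu_{i/N}, \mu^N) \le \sum_{i=1}^{N} W_2^2(\mu_{i/N}, \nu),$$

or,

$$\frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu_{i/N}, \mu^N) \le \frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu_{i/N}, \nu).$$

Consider now the sequence $\mu^N$; by Prokhorov's theorem and the tightness of the set $P(c(M))$, we can assume, up to extraction of a subsequence, that $\mu^N$ converges weakly. This implies that $\mu^N$ converges in the Wasserstein metric, and so, letting $\mu^\infty \in P(c(M))$ be the weak limit, $W_2(\mu^N, \mu^\infty) \to 0$. Now, by the triangle inequality, for any $\nu \in P_2(\mathbb{R}^n)$, we have

$$\frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu_{i/N}, \mu^\infty) \le \frac{1}{N} \sum_{i=1}^{N} \big[W_2(\mu_{i/N}, \mu^N) + W_2(\mu^N, \mu^\infty)\big]^2$$
$$= \frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu_{i/N}, \mu^N) + \frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu^N, \mu^\infty) + \frac{2}{N} \sum_{i=1}^{N} W_2(\mu_{i/N}, \mu^N)\, W_2(\mu^N, \mu^\infty)$$
$$\le \frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu_{i/N}, \nu) + \frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu^N, \mu^\infty) + \frac{2}{N} \sum_{i=1}^{N} W_2(\mu_{i/N}, \mu^N)\, W_2(\mu^N, \mu^\infty)$$
$$= \frac{1}{N} \sum_{i=1}^{N} W_2^2(\mu_{i/N}, \nu) + W_2^2(\mu^N, \mu^\infty) + \frac{2}{N} \sum_{i=1}^{N} W_2(\mu_{i/N}, \mu^N)\, W_2(\mu^N, \mu^\infty). \qquad (1)$$

Now, as the compact set $c(M) \subset \mathbb{R}^n$ is bounded and $\mu_{i/N}$ and $\mu^N$ are supported on $c(M)$, we have, for some constant $C$, $|x - y|^2 \le C$ whenever $x \in \mathrm{spt}(\mu_{i/N})$ and $y \in \mathrm{spt}(\mu^N)$. Therefore

$$W_2(\mu_{i/N}, \mu^N) \le \sqrt{C},$$

and so the last term in inequality (1) is bounded above by

$$2\sqrt{C}\, W_2(\mu^N, \mu^\infty).$$

Now, as $N \to \infty$, $\mu^N \to \mu^\infty$ in the Wasserstein metric, and so the last two terms above tend to 0. As the curve $t \mapsto \mu_t$ is continuous with respect to the Wasserstein distance, the mapping $t \mapsto W_2^2(\mu_t, \mu^\infty)$ is continuous, by the triangle inequality. Therefore, the quantity on the left hand side of inequality (1) tends to the Riemann integral of this curve as $N$ tends to $\infty$. A similar conclusion holds for the first term on the right hand side, and so, taking the limit as $N \to \infty$ in inequality (1) yields:

$$\int_0^1 W_2^2(\mu_t, \mu^\infty)\, dt \le \int_0^1 W_2^2(\mu_t, \nu)\, dt.$$

As this holds for any measure $\nu \in P_2(\mathbb{R}^n)$, this means that $\mu^\infty$ is a barycenter. □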



3.2 Uniqueness of the barycenter 

In this section, we establish uniqueness of the barycenter, under Assumption A. 

Lemma 3.2.1. Fix $\nu \in P_2(\mathbb{R}^n)$. The function $P_2(\mathbb{R}^n) \ni \mu \mapsto W_2^2(\nu, \mu)$ is convex on $P_2(\mathbb{R}^n)$. If $\nu$ is absolutely continuous with respect to Lebesgue measure, it is strictly convex.

Note that convexity here does not mean displacement convexity in the sense of McCann [20]; instead it means convexity with respect to the usual linear structure on the space of probability measures. This type of convexity is well known, and has been exploited in, for example, [16]. To the best of my knowledge, however, the strict convexity has not been explored.

Proof. Choose two measures $\mu_0$ and $\mu_1$ in $P_2(\mathbb{R}^n)$. Let $\gamma_i$ be optimal couplings between $\mu_i$ and $\nu$, for $i = 0, 1$, respectively. Now, let $\mu_s = s\mu_1 + (1 - s)\mu_0$ and set $\gamma_s = s\gamma_1 + (1 - s)\gamma_0$. Note that $\gamma_s$ is a coupling of $\mu_s$ and $\nu$. We then have

$$W_2^2(\nu, \mu_s) \le \int |x - y|^2\, d\gamma_s = s \int |x - y|^2\, d\gamma_1 + (1 - s) \int |x - y|^2\, d\gamma_0 = s W_2^2(\nu, \mu_1) + (1 - s) W_2^2(\nu, \mu_0). \qquad (2)$$

This establishes convexity of the function $\mu \mapsto W_2^2(\nu, \mu)$. We now show that this convexity is strict if $\nu$ is absolutely continuous with respect to Lebesgue measure. In this case, Brenier's theorem implies the existence of an optimal map $F_s : \mathrm{spt}(\nu) \to \mathrm{spt}(\mu_s)$ for each $s$, such that the unique optimal measure $\eta$ coupling $\nu$ and $\mu_s$ is concentrated on the graph $\{(x, F_s(x))\}$ [1].

Assume now that $\mu_0 \ne \mu_1$ and that $0 < s < 1$; we need to show that the inequality (2) is strict. Note first that the inequality is strict unless $\gamma_s$ is an optimal coupling between $\nu$ and $\mu_s$.

Now, as $\mu_0 \ne \mu_1$, the set $\{x : F_0(x) \ne F_1(x)\}$ has positive $\nu$ measure. Note that for each $x$ where $F_0(x) \ne F_1(x)$, the coupling $\gamma_s$ splits the mass at the point $x$ between $F_0(x)$ and $F_1(x)$; for such $x$, both $(x, F_0(x))$ and $(x, F_1(x))$ belong to the support of the measure $\gamma_s$. On the other hand, the optimal measure $\eta$ coupling $\nu$ and $\mu_s$ is concentrated on the graph of a function $F_s$, and so, for $\nu$ almost all $x$, there is only one point $(x, y)$ in the support of the optimizer (namely $(x, F_s(x))$). This immediately implies that $\gamma_s$ is not the optimal coupling of $\nu$ and $\mu_s$ and so we must have a strict inequality. This completes the proof.

□ 
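The convexity inequality (with respect to linear mixing of measures, not displacement interpolation) can be observed on a small example. The sketch below is restricted to discrete measures on the real line, where the optimal coupling is the monotone one and can be computed by sweeping through the sorted atoms. The helper names `w2_squared_1d` and `mixture` and the three measures are hypothetical choices for illustration; since $\nu$ is discrete here, only convexity (not the strictness statement of the lemma) is guaranteed in general.

```python
def w2_squared_1d(mu, nu, tol=1e-12):
    """Squared W_2 distance between two discrete probability measures on the
    real line, each given as a list of (atom, weight) pairs.
    Uses the monotone coupling, which is optimal in one dimension."""
    a, b = sorted(mu), sorted(nu)
    i = j = 0
    wa, wb = a[0][1], b[0][1]
    cost = 0.0
    while True:
        moved = min(wa, wb)          # mass sent from atom a[i] to atom b[j]
        cost += moved * (a[i][0] - b[j][0]) ** 2
        wa -= moved
        wb -= moved
        if wa <= tol:
            i += 1
            if i == len(a):
                break
            wa = a[i][1]
        if wb <= tol:
            j += 1
            if j == len(b):
                break
            wb = b[j][1]
    return cost

def mixture(mu0, mu1, s):
    """Linear interpolation mu_s = s * mu1 + (1 - s) * mu0 of two discrete measures."""
    return [(x, (1.0 - s) * w) for x, w in mu0] + [(x, s * w) for x, w in mu1]

# Illustrative data: nu has two atoms, mu1 equals nu and mu0 is nu shifted by 10.
nu = [(0.0, 0.5), (1.0, 0.5)]
mu0 = [(10.0, 0.5), (11.0, 0.5)]
mu1 = [(0.0, 0.5), (1.0, 0.5)]
s = 0.5

lhs = w2_squared_1d(nu, mixture(mu0, mu1, s))
rhs = s * w2_squared_1d(nu, mu1) + (1.0 - s) * w2_squared_1d(nu, mu0)
print(lhs, rhs)
```

In this instance the mixed coupling $\gamma_s$ sends mass from each atom of $\nu$ to two different targets and is not monotone, so the left hand side comes out strictly smaller than the right hand side.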

The preceding lemma easily implies the following result. 

Lemma 3.2.2. The function $\mu \mapsto \int_0^1 W_2^2(\mu_t, \mu)\, dt$ is convex on $P_2(\mathbb{R}^n)$. If Assumption A is satisfied, the function is strictly convex.

Proof. Let $\mu^0, \mu^1 \in P_2(\mathbb{R}^n)$ and, for $s \in (0, 1)$, set $\mu^s = s\mu^1 + (1 - s)\mu^0$. The preceding lemma implies that for all $t \in [0, 1]$ and all $s \in (0, 1)$ we have

$$W_2^2(\mu_t, \mu^s) \le s W_2^2(\mu_t, \mu^1) + (1 - s) W_2^2(\mu_t, \mu^0),$$

and, under Assumption A, the inequality is strict on a subset of $[0, 1]$ of positive measure whenever $\mu^0 \ne \mu^1$. Integrating with respect to $t$ yields the desired result. □

This result immediately implies the uniqueness of the barycenter.

Corollary 3.2.3. Under regularity Assumption A, the barycenter is unique. 

3.3 Regularity of the barycenter 

In this subsection we obtain a regularity result on the barycenter $\mu^\infty$ which will be crucial to our construction of the optimal stochastic process in section 4. Agueh and Carlier [1] proved the following regularity result for the barycenter of finitely many measures (i.e., a minimizer of $(B_N)$): assume that, for at least one $i \in \{1, 2, \ldots, N\}$, the measure $\mu_i$ is absolutely continuous with respect to Lebesgue measure, with an $L^\infty$ density $g_i$. Then the barycenter $\mu^N$ is absolutely continuous with an $L^\infty$ density $g^N$ and

$$\|g^N\|_{L^\infty} \le N^n \|g_i\|_{L^\infty}.$$

Our general strategy in this subsection is to approximate $\mu^\infty$ by barycenters $\mu^N$ of finitely many measures, much like in subsection 3.1, and then deduce a regularity result (i.e., a bound on the $L^\infty$ norm of the density) from the regularity of the $\mu^N$. Of course, the bound above tends to infinity as $N$ tends to infinity and so to accomplish this goal we will need a refined regularity result on the barycenters of finitely many measures, with a bound on the $L^\infty$ norm which is uniform in $N$.

Proposition 3.3.1. Let $\mu_i \in P_2(\mathbb{R}^n)$ for $i = 1, 2, \ldots, N$ and suppose $\mu$ minimizes $\mu \mapsto \sum_{i=1}^{N} \lambda_i W_2^2(\mu_i, \mu)$ on $P_2(\mathbb{R}^n)$, where $0 < \lambda_i < 1$ and $\sum_{i=1}^{N} \lambda_i = 1$. Let $B \subset \{1, 2, \ldots, N\}$ be nonempty and assume that for all $i \in B$, $\mu_i$ is absolutely continuous with respect to Lebesgue measure with an $L^\infty$ density $g_i$. Then $\mu$ is absolutely continuous with respect to Lebesgue measure with an $L^\infty$ density $g$ satisfying:

$$\|g\|_{L^\infty} \le \Big[\sum_{i \in B} \frac{\lambda_i}{\|g_i\|_{L^\infty}^{1/n}}\Big]^{-n}.$$

Proof. By a result of Agueh and Carlier ([1], Proposition 3.8), for almost all $x$, $\sum_{i=1}^{N} \lambda_i Du_i(x) = x$, where $Du_i$ is the Brenier map pushing the barycenter $\mu$ forward to $\mu_i$. As each convex function $u_i$ is twice differentiable almost everywhere, we can differentiate this equation to obtain

$$\sum_{i=1}^{N} \lambda_i D^2 u_i(x) = I_n$$

for almost all $x$, where $I_n$ is the $n \times n$ identity matrix. Taking determinants and $n$-th roots yields:

$$\Big[\det \sum_{i=1}^{N} \lambda_i D^2 u_i(x)\Big]^{1/n} = 1.$$

As each $D^2 u_i(x)$ is symmetric and positive semi-definite wherever it exists, Minkowski's determinant inequality combined with the preceding equation yields:

$$\sum_{i=1}^{N} \lambda_i \big(\det D^2 u_i(x)\big)^{1/n} \le \Big[\det \sum_{i=1}^{N} \lambda_i D^2 u_i(x)\Big]^{1/n} = 1.$$

As each term $\lambda_i (\det D^2 u_i(x))^{1/n}$ is non-negative, we obtain

$$\sum_{i \in B} \lambda_i \big(\det D^2 u_i(x)\big)^{1/n} \le \sum_{i=1}^{N} \lambda_i \big(\det D^2 u_i(x)\big)^{1/n} \le 1.$$

Now, using the result of Agueh and Carlier, we know that $\mu$ is absolutely continuous, i.e. $d\mu = g(x)\,dx$, and it is well known that for each $i \in B$, $u_i$ solves the Monge-Ampere equation almost everywhere, $\det D^2 u_i(x) = \frac{g(x)}{g_i(Du_i(x))}$. Combined with the preceding inequality, this implies

$$[g(x)]^{1/n} \le \Big[\sum_{i \in B} \frac{\lambda_i}{g_i(Du_i(x))^{1/n}}\Big]^{-1} \le \Big[\sum_{i \in B} \frac{\lambda_i}{\|g_i\|_{L^\infty}^{1/n}}\Big]^{-1}.$$



□ 
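The key step in the proof above is Minkowski's determinant inequality in its weighted form, $\sum_i \lambda_i (\det A_i)^{1/n} \le (\det \sum_i \lambda_i A_i)^{1/n}$ for symmetric positive semi-definite $A_i$. A quick numerical sanity check for $n = 2$ is sketched below; the three matrices and the weights are arbitrary illustrative values playing the roles of the Hessians $D^2 u_i(x)$ and the $\lambda_i$, and are not normalized so that the weighted sum is the identity.

```python
import math

def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def weighted_sum(mats, weights):
    """Entrywise weighted sum of 2 x 2 matrices."""
    return [[sum(w * m[r][c] for m, w in zip(mats, weights)) for c in range(2)]
            for r in range(2)]

# Symmetric positive definite matrices (illustrative stand-ins for Hessians).
hessians = [
    [[2.0, 0.5], [0.5, 1.0]],
    [[1.0, -0.3], [-0.3, 3.0]],
    [[0.5, 0.0], [0.0, 0.5]],
]
weights = [0.2, 0.3, 0.5]  # positive weights summing to 1

n = 2
lhs = sum(w * det2(h) ** (1.0 / n) for h, w in zip(hessians, weights))
rhs = det2(weighted_sum(hessians, weights)) ** (1.0 / n)
print(lhs, rhs)
```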



Lemma 3.3.2. Assume Assumption B and let $m_K > 0$ be the Lebesgue measure of the set $A_K$. Then there exists a sequence of measures $\bar\mu^N$, absolutely continuous with respect to Lebesgue measure, with densities $\bar g^N(x)$ satisfying $\|\bar g^N\|_{L^\infty} \le \frac{K}{m_K^n}$, converging weakly to $\mu^\infty$.

Proof. For $i = 0, 1, \ldots, N - 1$, set $I_i = [\frac{i}{N}, \frac{i+1}{N}]$. Let $B_K = \{i : I_i \cap A_K \ne \emptyset\}$ be the subset of indices $i$ for which $I_i$ contains at least one point in the set $A_K$. The union $\bigcup_{i \in B_K} I_i$ clearly covers $A_K$, and so, denoting the size of $B_K$ by $|B_K|$, we must have

$$\frac{|B_K|}{N} \ge m_K.$$

Now, we choose $t_i \in I_i$ and approximate the Riemann integral much like in the proof of existence, except that, whenever $i$ is in $B_K$, we choose the point $t_i \in I_i \cap A_K$, rather than taking $t_i = \frac{i}{N}$. We define $\bar\mu^N$ to be the barycenter of the measures $\mu_{t_i}$, with equal weights $\lambda_i = \frac{1}{N}$; that is, the minimizer on $P_2(\mathbb{R}^n)$ of:

$$\mu \mapsto \frac{1}{N} \sum_{i=0}^{N-1} W_2^2(\mu_{t_i}, \mu).$$

Proposition 3.3.1 implies that the barycenter $\bar\mu^N$ is absolutely continuous with respect to Lebesgue measure, with a density $\bar g^N(x)$ satisfying

$$\|\bar g^N\|_{L^\infty} \le \Big[\sum_{i \in B_K} \frac{\lambda_i}{\|g_{t_i}\|_{L^\infty}^{1/n}}\Big]^{-n} \le \Big[\sum_{i \in B_K} \frac{1}{N K^{1/n}}\Big]^{-n} = \Big[\frac{|B_K|}{N K^{1/n}}\Big]^{-n} = \frac{N^n K}{|B_K|^n} \le \frac{K}{m_K^n}.$$

Now, up to extraction of a subsequence, $\bar\mu^N$ converges weakly by Prokhorov's theorem to some measure $\bar\mu^\infty$. Exactly as in the proof of existence, one can prove that $\bar\mu^\infty$ is a barycenter. It then follows by the uniqueness result in the last subsection that $\bar\mu^\infty = \mu^\infty$. □

By approximation, we then easily obtain the following regularity result on our barycenter $\mu^\infty$.

Corollary 3.3.3. Under Assumption B, the barycenter $\mu^\infty$ is absolutely continuous with respect to Lebesgue measure and its density $g^\infty(x)$ satisfies $\|g^\infty\|_{L^\infty} \le \frac{K}{m_K^n}$.

Proof. It suffices to prove $\mu^\infty(A) \le \frac{K}{m_K^n} |A|$ for any Borel set $A \subset c(M)$. If $A$ is open, this follows easily from Lemma 3.3.2, as the weak convergence $\bar\mu^N \to \mu^\infty$ implies

$$\mu^\infty(A) \le \liminf_{N \to \infty} \bar\mu^N(A) \le \frac{K}{m_K^n} |A|.$$

If $A$ is not open, we may, for any $\epsilon > 0$, find an open set $U$ such that $A \subset U$ and $|U \setminus A| < \epsilon$. Then we have

$$\mu^\infty(A) \le \mu^\infty(U) \le \frac{K}{m_K^n} |U| = \frac{K}{m_K^n} \big(|A| + |U \setminus A|\big) < \frac{K}{m_K^n} \big(|A| + \epsilon\big).$$

Taking the limit as $\epsilon \to 0$ yields the desired result.

□



3.4 Generalization to other spaces and distributions 

The purpose of this subsection is to demonstrate that our approach to unique- 
ness of the barycenter holds for more general underlying spaces M and more 
general distributions of measures. The results of this subsection are not essential 
to the rest of the paper and can safely be skipped. 

For this subsection only, let $(M, g)$ be a compact Riemannian manifold and $P(M)$ denote the set of Borel probability measures on $M$. Given probability measures $\mu$ and $\nu$ in $P(M)$, the Wasserstein distance between $\mu$ and $\nu$ is defined as in Euclidean space, with the Riemannian distance squared replacing the Euclidean distance:

$$W_2^2(\mu, \nu) = \inf_{\gamma} \int_{M \times M} d^2(x, y)\, d\gamma(x, y),$$

where the infimum is over all measures $\gamma$ on $M \times M$ projecting to $\mu$ and $\nu$, respectively.

Now, let $\Gamma$ be a probability measure on $P(M)$. A barycenter of $\Gamma$ is a minimizer of

$$\mu \mapsto \int_{P(M)} W_2^2(\mu, \nu)\, d\Gamma(\nu).$$

By continuity and Prokhorov's theorem, it is straightforward to verify that a barycenter exists; see, for example, [21]. We note here that our proof of uniqueness relied only on existence and uniqueness of Monge solutions for arbitrary $\mu$ and a set of $\nu$ of positive $\Gamma$ measure. Let $P_{ac}(M)$ be the set of Borel probability measures on $M$ which are absolutely continuous with respect to local coordinates. By McCann's theorem [19], whenever $\nu \in P_{ac}(M)$, there is a unique optimal map between $\nu$ and $\mu$. Therefore, we obtain:

Theorem 3.4.1. Suppose that $P_{ac}(M) \subset P(M)$ has positive $\Gamma$ measure. Then the barycenter of $\Gamma$ is unique.

Assuming $M$ is a bounded subset of $\mathbb{R}^n$, when $\Gamma$ has finite support, this yields the uniqueness theorem of Agueh and Carlier. When $\Gamma$ is supported on a Wasserstein continuous curve, we recover our results from a previous section.

4 Infinitely many marginals 

4.1 Construction and basic properties of the optimal process

We now return to our problem of primary present interest, namely the optimal transportation problem with infinitely many marginals, $(MK_\infty)$.

We will use the barycenter from the previous section to construct a stochastic process $X_t^{opt}$. We will then show that this process is the unique minimizer in $(MK_\infty)$.

We construct our optimal process $X_t^{opt}$ as follows.

Definition 4.1.1. We take our underlying probability space to be $\mathbb{R}^n$, with the barycenter $\mu^\infty$ as the probability measure. Then, taking $Du_t$ to be the Brenier map pushing $\mu^\infty$ forward to $\mu_t$, we define a stochastic process by

$$X_t^{opt}(x) = Du_t(x).$$

Note that this definition means that the sample paths of the optimal process $X_t^{opt}$ are $t \mapsto Du_t(x)$, for $x \in c(M)$, with a probability given by $\mu^\infty$. Recall that a stochastic process $Y_t$ is continuous in probability (or in measure) if, for all $t \in [0, 1]$ and all $\epsilon > 0$,

$$\lim_{s \to t} P(|Y_s - Y_t| > \epsilon) = 0.$$

To prove measurability of $X_t^{opt}$, we will need the following proposition.

Proposition 4.1.2. The process $X_t^{opt}$ is continuous in probability.

Proof. Recall that the path $\mu_t$ is weakly continuous; the claim then easily follows from a well known result on the stability of optimal transportation (see, for example, Villani [29], Corollary 5.23). □

Recall that a stochastic process $Y_t$ is a version of another process $Z_t$ if, for all $t \in [0, 1]$, $Y_t = Z_t$ almost surely. In this case, we say that $Y_t$ and $Z_t$ are stochastically equivalent. It is a well known result that every stochastic process which is continuous in probability has a measurable version, and so the preceding proposition implies:

Corollary 4.1.3. The process $X_t^{opt}$ has a measurable version.

In light of the preceding corollary, we assume from now on that the process $X_t^{opt}$ is measurable.

4.2 Proof of optimality

Our aim is now to prove that the process $X_t^{opt}$ defined in the last subsection is in fact optimal for $(MK_\infty)$. As a preliminary step in this direction, we will need to show that the average measure, defined by

$$\mu^{opt} = \mathrm{law}\Big(\int_0^1 X_t^{opt}\, dt\Big),$$

is in fact the barycenter $\mu^\infty$.

Proposition 4.2.1. Assume Assumption B. Then, for the process $X_t^{opt}$ from Definition 4.1.1, the average measure coincides with the barycenter: $\mu^{opt} = \mu^\infty$.

Proof. By Corollary 3.3.3, the barycenter $\mu^\infty$ is absolutely continuous with respect to Lebesgue measure. Now, for each $t$, the function

$$\mu \mapsto W_2^2(\mu, \mu_t),$$

restricted to the set $P_{ac,2}(\mathbb{R}^n)$ of absolutely continuous measures with finite second moments, is differentiable with respect to the Wasserstein structure on $P_{ac,2}(\mathbb{R}^n)$ [1]. This means that given a curve $\mu_s$ in $P_{ac,2}(\mathbb{R}^n)$ with $\mu_0 = \mu^\infty$, we have:

$$\frac{d}{ds}\Big|_{s=0} W_2^2(\mu_s, \mu_t) = 2 \int_{\mathbb{R}^n} \langle y - Du_t(y), \xi_0(y) \rangle\, d\mu^\infty(y),$$

where $\xi_s(y)$ is a vector field satisfying $\frac{\partial \mu_s}{\partial s} + \nabla \cdot (\mu_s \xi_s(y)) = 0$; that is, the tangent to $\mu_s$ in $P_{ac,2}(\mathbb{R}^n)$. Note that we are abusing notation slightly by identifying the measure $\mu_s$ with its density.

Using the dominated convergence theorem, this means that

$$\mu \mapsto \int_0^1 W_2^2(\mu, \mu_t)\, dt$$

is differentiable on $P_{ac,2}(\mathbb{R}^n)$ as well, and so its derivative must vanish at the minimizer, $\mu^\infty$. Using the formula for the derivative, we have

$$0 = \int_0^1 \int_{\mathbb{R}^n} \langle y - Du_t(y), \xi_0(y) \rangle\, d\mu^\infty(y)\, dt$$

for any tangent vector field $\xi_0$. By Fubini's theorem,

$$0 = \int_{\mathbb{R}^n} \Big\langle \int_0^1 \big(y - Du_t(y)\big)\, dt,\ \xi_0(y) \Big\rangle\, d\mu^\infty(y). \qquad (3)$$

Note this holds for all tangent vector fields $\xi_0$ to $P_{ac,2}(\mathbb{R}^n)$ at $\mu^\infty$. As each $u_t$ is a convex function, the integral $v(x) = \int_0^1 u_t(x)\, dt$ is also convex, and again using the dominated convergence theorem we have:

$$Dv(x) = \int_0^1 Du_t(x)\, dt.$$

In particular, we can take $\xi_0(y) = y - \int_0^1 Du_t(y)\, dt = \int_0^1 (y - Du_t(y))\, dt$ in (3), to obtain

$$0 = \int_{\mathbb{R}^n} \Big\langle \int_0^1 \big(y - Du_t(y)\big)\, dt,\ \int_0^1 \big(y - Du_t(y)\big)\, dt \Big\rangle\, d\mu^\infty(y) = \int_{\mathbb{R}^n} \Big| \int_0^1 \big(y - Du_t(y)\big)\, dt \Big|^2\, d\mu^\infty(y).$$

This implies

$$0 = \int_0^1 \big(y - Du_t(y)\big)\, dt$$

$\mu^\infty$ almost everywhere. Therefore, $y \mapsto \int_0^1 Du_t(y)\, dt = \int_0^1 X_t^{opt}(y)\, dt$ is the identity mapping. As this map pushes $\mu^\infty$ forward to $\mu^{opt}$, this immediately implies the desired result.

□ 
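The identity $\int_0^1 Du_t(y)\, dt = y$ can be seen concretely in a one-dimensional Gaussian example. The sketch below relies on assumptions that are not part of the text: $n = 1$, a hypothetical curve $\mu_t = N(m(t), \sigma(t)^2)$ with $m(t) = t$ and $\sigma(t) = 1 + t$, and the standard one-dimensional facts that the barycenter of Gaussians is the Gaussian with averaged mean and averaged standard deviation, and that the optimal map between two Gaussians on the line is the increasing affine map. Time is discretized on a midpoint grid, so the barycenter used is that of the discretized family.

```python
def mean_t(t):
    """Mean of the hypothetical marginal mu_t."""
    return t

def std_t(t):
    """Standard deviation of the hypothetical marginal mu_t."""
    return 1.0 + t

N = 200
times = [(i + 0.5) / N for i in range(N)]  # midpoint grid on [0, 1]

# Barycenter of the discretized Gaussian family on the line:
# Gaussian with averaged mean and averaged standard deviation.
bar_mean = sum(mean_t(t) for t in times) / N
bar_std = sum(std_t(t) for t in times) / N

def brenier_map(t, x):
    """Increasing affine map pushing N(bar_mean, bar_std^2) forward to mu_t;
    this plays the role of Du_t, i.e. of the sample path t -> X_t^opt(x)."""
    return mean_t(t) + std_t(t) * (x - bar_mean) / bar_std

def path_average(x):
    """Discrete analogue of the time average of the sample path started at x."""
    return sum(brenier_map(t, x) for t in times) / N

residuals = [abs(path_average(x) - x) for x in (-2.0, 0.0, 0.7, 3.0)]
print(residuals)
```

The residuals vanish up to rounding error, which is the discrete counterpart of the statement that the time average of each sample path returns its label $y$.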

Theorem 4.2.2. Assume Assumption B. Then the process $X_t^{opt}$ from Definition 4.1.1 is optimal for $(MK_\infty)$. It is the unique optimizer in the sense that, if $Y_t$ is any other optimal process, we have for almost all $t$, $X_t^{opt} = Y_t$, almost surely.

Proof. It is clear from the construction that $\mathrm{law}(X_t^{opt}) = \mu_t$ for all $t$. Now, take any stochastic process $Y_t$ such that $\mathrm{law}(Y_t) = \mu_t$. Denote by $\mu_a$ the law of the random variable $Y_a = \int_0^1 Y_t\, dt$. We will denote by $\mu_{t,a}$ the law of the ordered pair $(Y_t, Y_a)$ on $\mathbb{R}^n \times \mathbb{R}^n$; note that this implies that $\mu_t$ and $\mu_a$ are the marginals of $\mu_{t,a}$. Now, note that:

$$E\Big(\int_0^1 \Big|Y_t - \int_0^1 Y_s\, ds\Big|^2 dt\Big) = E\Big(\int_0^1 |Y_t|^2\, dt - 2 \int_0^1 Y_t\, dt \cdot \int_0^1 Y_s\, ds + \Big|\int_0^1 Y_s\, ds\Big|^2\Big)$$
$$= E\Big(\int_0^1 |Y_t|^2\, dt - \Big|\int_0^1 Y_s\, ds\Big|^2\Big)$$
$$= \int_0^1 E|Y_t|^2\, dt - E\Big|\int_0^1 Y_s\, ds\Big|^2$$
$$= \int_0^1 \int_{\mathbb{R}^n} |y|^2\, d\mu_t(y)\, dt - E\Big|\int_0^1 Y_s\, ds\Big|^2.$$

Note that the first term above depends only on $\mathrm{law}(Y_t) = \mu_t$. Therefore, maximizing $E(|\int_0^1 Y_s\, ds|^2)$ subject to the constraint $\mathrm{law}(Y_t) = \mu_t$ is equivalent to minimizing $E(\int_0^1 |Y_t - \int_0^1 Y_s\, ds|^2\, dt)$, subject to the same constraint.

Now, we have, using Fubini,

$$E\Big(\int_0^1 \Big|Y_t - \int_0^1 Y_s\, ds\Big|^2 dt\Big) = \int_0^1 E\Big(\Big|Y_t - \int_0^1 Y_s\, ds\Big|^2\Big)\, dt$$
$$= \int_0^1 \Big(\int_{\mathbb{R}^n \times \mathbb{R}^n} |y_t - y_a|^2\, d\mu_{t,a}(y_t, y_a)\Big)\, dt$$
$$\ge \int_0^1 W_2^2(\mu_t, \mu_a)\, dt$$
$$\ge \int_0^1 W_2^2(\mu_t, \mu^\infty)\, dt.$$

Observe that we have equality if and only if

1. $\mu_a$ is the barycenter $\mu^\infty$ of the $\mu_t$'s and,

2. for almost all $t$, the measure $\mu_{t,a}$ is the optimal coupling between $\mu_t$ and $\mu_a$.

By Proposition 4.2.1 and the definition of $X_t^{opt}$, both conditions hold for $Y_t = X_t^{opt}$, so $X_t^{opt}$ is optimal. Now, assuming the first condition, $\mu_a = \mu^\infty$ is absolutely continuous with respect to Lebesgue measure by Corollary 3.3.3 and so the optimal coupling $\mu_{t,a}$ is concentrated on the graph of the function $x \mapsto Du_t(x)$. Therefore, these two conditions imply that the sample path $Y_t$ is completely determined almost surely by $Y_a$, which is distributed according to the barycenter. We can therefore take the underlying probability space to be $\mathbb{R}^n$ with the measure $\mu^\infty$, and the second condition implies that $Y_t = X_t^{opt}$ almost surely, for almost all $t$. □

We can obtain a more elegant uniqueness result if we restrict our attention to stochastic processes which are continuous in probability; the following theorem implies that $X_t^{opt}$ is the unique, continuous in probability optimizer for $(MK_\infty)$, modulo stochastic equivalence.

Theorem 4.2.3. Assume Assumption B and suppose $Y_t$ is optimal for $(MK_\infty)$ and $Y_t$ is continuous in probability. Then $Y_t$ is a version of $X_t^{opt}$.

Proof. From our previous uniqueness result, we know that $Y_t = X_t^{opt}$ almost surely, for almost all $t$. We need to prove this for all $t$.

Fix $t_0 \in [0, 1]$. Then we can choose a sequence $t_i$ converging to $t_0$ such that $Y_{t_i} = X_{t_i}^{opt}$ almost surely. By continuity in probability (of $Y_t$ by assumption, and of $X_t^{opt}$ by Proposition 4.1.2), $Y_{t_i}$ converges to $Y_{t_0}$ in probability and $X_{t_i}^{opt}$ converges to $X_{t_0}^{opt}$ in probability. This immediately implies $X_{t_0}^{opt} = Y_{t_0}$ almost surely, as desired. □

We now prove an analogue of Brenier's theorem [3] (for two marginal problems) and the result of Gangbo and Swiech [14] (for several marginals). In our context, it is natural to interpret this result as saying that we can take the underlying probability space of our stochastic process to be $M \subset \mathbb{R}^n$.

Theorem 4.2.4. (Monge solutions) Assume Assumption B holds and suppose $\mu_{t_0} \in P_{ac}(M)$. Then we can take $\mu_{t_0}$ to be the underlying probability space of the unique optimal process $X_t^{opt}$. That is, the optimal process can be written as $X_t^{opt} = F_t(X_{t_0}^{opt})$, where, for each $t$, $F_t : \mathbb{R}^n \to \mathbb{R}^n$ is a mapping pushing $\mu_{t_0}$ forward to $\mu_t$, and $F_{t_0}$ is the identity mapping.

Proof. As $\mu_{t_0}$ does not charge small sets, Brenier's theorem implies that the optimal map $Du_{t_0}$ pushing the barycenter $\mu^\infty$ forward to $\mu_{t_0}$ is invertible almost everywhere; its inverse is $Du_{t_0}^*$, where $u_{t_0}^*$ is the Legendre transform of $u_{t_0}$. We then have $X_t^{opt}(x) = Du_t(x) = Du_t(Du_{t_0}^*(z))$, where $x$ is distributed according to $\mu^\infty$ and $z = Du_{t_0}(x)$ is distributed according to $\mu_{t_0}$. Taking $F_t = Du_t \circ Du_{t_0}^*$ yields the desired result. □

This result means that the stochastic process $X_t^{opt}$ is deterministic, in the sense that if we know $X_{t_0}^{opt}$, we know $X_t^{opt}$ for all $t$.